Abstract: We consider a sequential problem in decentralized
detection. Two observers can make repeated noisy observations 
of a binary hypothesis on the state of the environment. At 
any time, either of the two observers can stop and send a final
message to the other observer, or it may continue to take more
measurements. After an observer has sent its final message, 
it stops operating. The other observer is then faced with a 
different stopping problem. At each time instant, it can decide 
either to stop and declare a final decision on the hypothesis or 
take another measurement. At each time, the system incurs an 
operating cost depending on the number of observers that are 
active at that time. A terminal cost that measures the accuracy 
of the final decision is incurred at the end. We show that, 
unlike in other sequential detection problems, stopping rules 
characterized by two thresholds on an observer's posterior 
belief no longer guarantee optimality in this problem. Thus 
the potential for signaling among observers alters the nature of 
optimal policies. We obtain a new parametric characterization 
of optimal policies for this problem. 

I. Introduction 

Decentralized detection problems are motivated by appli- 
cations in large scale decentralized systems such as sensor 
networks, power systems and surveillance networks. In such 
networks, sensors receive different information about the 
environment but share a common objective, for example to 
decide if a fault has occurred or not in a power system, or to 
detect the presence of a target in a surveillance area. Sensors 
may be allowed to communicate but they are constrained 
to exchange only a limited amount of information because 
of energy constraints, data storage and data processing con- 
straints, communication constraints etc. 

Decentralized detection problems may be static or se- 
quential. In static problems, sensors make a fixed number 
of observations about a hypothesis on the state of the 
environment which is modeled as a random variable H. 
Sensors may transmit a single message (a quantized version 
of their observations) to a fusion center which makes a final 
decision on H . Such problems have been extensively studied 
since their initial formulation in [1] (See the surveys in [2], 
[3] and references therein). In most such formulations, it has 
been shown that person-by-person optimal decision rules (as 
defined in [4]) for a binary hypothesis detection problem 
are characterized by thresholds on the likelihood ratio (or 
equivalently on the posterior belief on the hypothesis). 

In sequential problems, the number of observations taken 
by the sensors is not fixed a priori. In the centralized 
sequential detection problems, as formulated in [5], a sensor 
can sequentially make costly observations and, after each 
observation, can choose whether to stop and declare its
final decision on H or to take more observations. In the 
decentralized analogue of the sequential detection problem, 
two (or more) sensors locally decide when to stop taking 
more measurements and then make a final decision on H. 
Each sensor pays a penalty for delaying its final decision 
and a terminal cost that depends on the final decisions of 
all the sensors and the true value of H is incurred at the 
end. A version of this problem (called the decentralized 
Wald problem) was formulated in [6] and it was shown 
that at each time instant, optimal policies for the sensors 
are described by two thresholds. The computation of these 
thresholds requires solution of two coupled sets of dynamic 
programming equations. Similar results were obtained in a 
continuous time setting in [7]. 

A key feature of the decentralized Wald problem is that 
the individual sensors do not communicate their decisions to 
each other. That is, the i th sensor is not aware of decisions 
of other sensors. This implies that if policies of all other 
sensors are fixed, the i th sensor is faced with a classical 
sequential detection problem for which two-threshold poli- 
cies are optimal. In the problem we consider in this paper, 
each sensor observes the other sensor's decisions. Hence, in 
addition to its own measurements of H, the i th sensor can 
use the decisions made by other sensors (whether they have 
stopped or not and whether the final decision was 0 or 1)
to make its decisions. The final decision of the sensor that 
stops in the end is taken as the final decision made by the 
group of sensors. Thus, sensors can convey information to 
each other through their decisions. The presence of signaling 
among sensors implies that, even if all other sensors have 
fixed their strategies, the problem for the i th sensor is no
longer a classical sequential detection problem. We show 
that, for this problem, the classical two-threshold policies 
no longer guarantee optimality. We obtain an alternative 
parametric characterization of the optimal policies of the 
sensors. A related sequential detection problem with one- 
way communication was presented in [8]. 

The rest of the paper is organized as follows. In Section II
we formulate our problem with two observers. We present
the information states for the observers in Section III. A
counterexample that shows that classical two-threshold rules are
not necessarily optimal is presented in Section IV. We
derive a parametric characterization of optimal policies in
Section V. We conclude in Section VI.

Notation: Throughout this paper, $X_{1:t}$ refers to the
sequence $X_1, X_2, \ldots, X_t$. Subscripts are used as time index
and the superscripts are used as the index of the sensor.
We use capital letters to denote random variables and the
corresponding lower case letters for their realizations.






II. Problem formulation 

Consider a binary hypothesis problem where the true
hypothesis is modeled as a random variable H taking values
0 or 1 with known prior probabilities:

$$P(H = 0) = p_0, \qquad P(H = 1) = 1 - p_0.$$

Consider two observers: observer 1 (O1) and observer 2
(O2). We assume that each observer can make noisy
observations of the true hypothesis. Conditioned on the hypothesis
H, the following statements are assumed to be true: 

1. The observation of the i th observer at time t, $Y_t^i$
(taking values in the set $\mathcal{Y}^i$), either has a discrete distribution
($P_t^i(\cdot|H)$) or admits a probability density function ($f_t^i(\cdot|H)$).

2. Observations of the i th observer at different time instants 
are conditionally independent given H. 

3. The observation sequences at the two observers are 
conditionally independent given H. 

Observer i (i = 1, 2) observes the measurement process
$Y_t^i$ (t = 1, 2, ...). If no observer has stopped before time t,
then at time t, any observer can decide either to stop and
send a binary message 0 or 1 to the other observer or to
postpone its decision and get another measurement. After an
observer has sent its final message, it stops operating. The
other observer (the one which has not yet stopped) is then
faced with a different stopping problem. At each time instant,
it can decide either to stop and declare a final decision on
the hypothesis or take another measurement.

Fig. 1. Decentralized Detection (Observer 1 and Observer 2).

We denote by $U_t^i$ the decision of the i th observer at time
t. $U_t^i$ belongs to the set $\{0, 1, b\}$, where b denotes a blank,
that is, no message or no final decision. At time t, the i th
observer makes its decision based on its observations till time
t and the messages (blanks or otherwise) exchanged between
the two observers till time t - 1. We have,

$$U_t^i = \gamma_t^i(Y_{1:t}^i, U_{1:t-1}^1, U_{1:t-1}^2) \tag{1}$$

where the collection of functions $\Gamma^i := (\gamma_t^i, t = 1, 2, \ldots)$
constitutes the policy of the i th observer. We define the
following stopping times:

$$\tau^1 := \min\{t : U_t^1 \neq b\}, \qquad \tau^2 := \min\{t : U_t^2 \neq b\}$$

After the time $\tau^i$ all future observations and decisions of the
i th observer are assumed to be in the empty set ($\emptyset$), that is,
the observer stops taking measurements or making decisions.
We assume that the final decision on the hypothesis must be
made no later than a finite horizon T, hence we have that
$\tau^1 \le T$ and $\tau^2 \le T$. We define $\tau^{\min} := \min\{\tau^1, \tau^2\}$
and $\tau^{\max} := \max\{\tau^1, \tau^2\}$. The
system incurs a total cost given by:
where $K > k > 0$ are constants and $J(\cdot,\cdot)$ is a
non-negative distortion function with $J(0, 0) = J(1, 1) = 0$.
(The superscript $\Gamma^1, \Gamma^2$ over the expectation denotes that
the expectation is with respect to a measure that depends
on the choice of the policies $\Gamma^1, \Gamma^2$.) The first term in the
objective represents the operating costs when both observers 
are active, the second term represents the operating cost of 
only one observer. The operating costs incorporate the cost of 
taking a new measurement, the energy cost of staying on for 
another time step and/or a penalty for delaying the decision. 
The last term in the objective represents the accuracy of the 
final decision. Note that only the decision of the observer 
that stops later is considered as the final decision. In case 
of simultaneous decisions, only observer 2's decision is 
considered as the final decision. We can now formulate the 
optimization problem as follows: 

Problem P: Given the statistics of the binary hypothesis
and the observation processes, the cost parameters K, k, the
distortion function $J(\cdot,\cdot)$ and a time horizon T, the objective
is to select policies $\Gamma^1, \Gamma^2$ that minimize the total expected
cost in (2).

III. Information States 

In this section, we identify information states for the 
two observers. We start by fixing the policy of observer 1 
to an arbitrary choice and finding a sufficient statistic for 
observer 2. The nature of this sufficient statistic does not 
depend on the arbitrary choice of observer 1's policy. This
sufficient statistic is the information state for observer 2. 

Consider a fixed policy $\Gamma^1 = (\gamma_1^1, \gamma_2^1, \ldots, \gamma_T^1)$ for O1. At
any time t, we define the following:

Definition 1: Given a fixed policy $\Gamma^1$ of observer 1 and
functions $\gamma_{1:t-1}^2$, we define observer 2's belief on the
hypothesis given all its information at time t:

$$\Pi_t^2 := P(H = 0 \mid Y_{1:t}^2, U_{1:t-1}^1, U_{1:t-1}^2 = b_{1:t-1}),$$

where $b_{1:t-1}$ denotes a sequence of blank messages from
time 1 to t - 1. For t = 0, we have $\Pi_0^2 = p_0$.



We will show that the pair $(\Pi_t^2, 1_{\{\tau^1 < t\}})$ is the information
state for observer 2. We first describe the evolution of
$(\Pi_t^2, 1_{\{\tau^1 < t\}})$ in time in the following lemma.

Lemma 1: (i) $1_{\{\tau^1 < t+1\}} = 1_{\{\tau^1 < t\}} + 1_{\{U_t^1 \neq b\}}$.

(ii) With observer 1's policy fixed to $\Gamma^1$, $\Pi_t^2$ evolves as
follows:

$$\Pi_{t+1}^2 = \begin{cases} f_{t+1}(\Pi_t^2, Y_{t+1}^2) & \text{if } 1_{\{\tau^1 < t\}} = 1 \\ g_{t+1}(\Pi_t^2, Y_{t+1}^2, U_t^1) & \text{if } 1_{\{\tau^1 < t\}} = 0 \end{cases}$$

where $f_{t+1}$ and $g_{t+1}$ are deterministic functions.
Proof: See Appendix I. ■
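As an illustration of the first case of Lemma 1(ii), the following minimal Python sketch implements a single Bayes step of the kind derived in Appendix I for the case where observer 1 has already stopped. The function name `update_belief` and the numerical likelihoods are hypothetical and chosen only for illustration; the paper does not prescribe an implementation.

```python
def update_belief(pi, lik0, lik1):
    """One Bayes step for pi = P(H = 0 | past) after a new observation y.

    lik0 = P(y | H = 0) and lik1 = P(y | H = 1) are supplied by the caller.
    """
    denom = lik0 * pi + lik1 * (1.0 - pi)
    if denom == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return lik0 * pi / denom


# Hypothetical likelihoods: the observation is twice as likely under H = 0.
posterior = update_belief(0.5, lik0=0.6, lik1=0.3)
```

Because the update depends on the past only through the current belief and the new observation, the belief can be carried forward recursively without storing the observation history.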

The optimal policy for observer 2 (for the given choice of
$\Gamma^1$) can be obtained by means of a dynamic program. We
now define the value functions of the dynamic program.



Definition 2: (i) For $\pi \in [0, 1]$ and $a \in \{0, 1\}$, we define

$$V_T(\pi, a) := \min\{E[J(0, H) \mid \Pi_T^2 = \pi],\; E[J(1, H) \mid \Pi_T^2 = \pi]\}$$

(ii) For $\pi \in [0, 1]$ and t = T - 1, T - 2, ..., 1, we define

$$V_t(\pi, 1) := \min\{E[J(0, H) \mid \Pi_t^2 = \pi],\; E[J(1, H) \mid \Pi_t^2 = \pi],\; k + E[V_{t+1}(\Pi_{t+1}^2, 1) \mid \Pi_t^2 = \pi, 1_{\{\tau^1 < t\}} = 1]\}$$

(iii) For $\pi \in [0, 1]$ and t = T - 1, T - 2, ..., 1, we define
$V_t(\pi, 0)$ in equation (3).

Theorem 1: With a fixed policy $\Gamma^1$ of observer 1, there is
an optimal policy for observer 2 of the form:

$$U_t^2 = \gamma_t^2(\Pi_t^2, 1_{\{\tau^1 < t\}})$$

for t = 1, 2, ..., T. Moreover, this optimal policy can be
obtained by the dynamic program described by the value
functions in Definition 2. Thus, at time t and for a given
$\pi$ and a, the optimal decision is 0 (or 1/b) if the first
(or second/third) term is the minimum in the definition of
$V_t(\pi, a)$.

Proof: See Appendix II. ■
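The backward recursion in Definition 2(ii) can be sketched numerically on a grid of beliefs. The Python sketch below covers only the case a = 1 (observer 1 has already stopped). It rests on assumptions that are not in the paper: observer 2's observations are binary and equal H with probability `q` at every step, the error cost is symmetric with J(0,1) = J(1,0) = `c`, and the value function is interpolated linearly between grid points. The name `wald_value_functions` is hypothetical.

```python
import numpy as np


def wald_value_functions(T, k, q, c, n_grid=201):
    """Backward recursion for V_t(pi, 1) on a grid of beliefs pi = P(H = 0).

    Assumed model (illustrative only): binary observations that equal H with
    probability q, and symmetric error cost J(0, 1) = J(1, 0) = c.
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    stop0 = c * (1.0 - grid)  # E[J(0, H) | pi]: declaring 0 is wrong when H = 1
    stop1 = c * grid          # E[J(1, H) | pi]: declaring 1 is wrong when H = 0
    V = np.minimum(stop0, stop1)  # terminal value V_T
    values = {T: V}
    for t in range(T - 1, 0, -1):
        cont = np.full(n_grid, float(k))  # operating cost of one more step
        for lik0, lik1 in ((q, 1.0 - q), (1.0 - q, q)):  # y = 0, then y = 1
            p_y = lik0 * grid + lik1 * (1.0 - grid)      # P(y | pi)
            nxt = np.divide(lik0 * grid, p_y,
                            out=np.zeros(n_grid), where=p_y > 0)
            cont += p_y * np.interp(nxt, grid, V)        # E[V_{t+1} | pi]
        V = np.minimum(np.minimum(stop0, stop1), cont)
        values[t] = V
    return grid, values


grid, values = wald_value_functions(T=5, k=1.0, q=0.8, c=10.0)
```

The optimal decision at a grid point is whichever of the three terms attains the minimum, as in Theorem 1. The case a = 0 would additionally require the distribution of observer 1's message, which depends on the fixed policy of observer 1 and is not modeled here.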

The above arguments can be repeated by interchanging the
roles of observer 1 and observer 2 to conclude that for a fixed
policy $\Gamma^2$ of observer 2, an optimal policy for observer 1 is
of the form:

$$U_t^1 = \gamma_t^1(\Pi_t^1, 1_{\{\tau^2 < t\}})$$

where $\Pi_t^1 := P^{\Gamma^2}(H = 0 \mid Y_{1:t}^1, U_{1:t-1}^2, U_{1:t-1}^1 = b_{1:t-1})$.
Note, however, that the actual dynamic program for
observer 1 will differ from that of Theorem 1 because of the
asymmetry in the objective function of equation (2) when
both observers make simultaneous decisions to stop. While
the value functions $V_T$ and $V_t(\pi, 1)$ in the dynamic program
for observer 1 will be the same as in Definition 2, the value
function $V_t(\pi, 0)$ is given in equation (4).

IV. A Counterexample 

In the sequential detection problem with a single observer
[5], it is well known that an optimal policy is a function
of the observer's posterior belief $\Pi_t$ and is described by two
thresholds at each time t. That is, the decision at time t, $Z_t$,
is given as:

$$Z_t = \begin{cases} 1 & \text{if } \Pi_t < \alpha_t \\ b & \text{if } \alpha_t \le \Pi_t \le \beta_t \\ 0 & \text{if } \Pi_t > \beta_t \end{cases}$$

where b denotes a decision to continue taking measurements
and $\alpha_t \le \beta_t$ are real numbers in [0, 1]. A similar
two-threshold structure of optimal policies was also established
for the decentralized Wald problem in [6]. We will show
by means of a counterexample that such a structure is not
necessarily optimal in our problem.
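For concreteness, the classical two-threshold rule described above can be written as a small Python function. This is only a sketch: the function name is hypothetical, and the tie-breaking at the thresholds (continuing when the belief equals a threshold) is an assumption, since ties do not affect the structure of the rule.

```python
def two_threshold_decision(pi, alpha, beta):
    """Classical rule on pi = P(H = 0 | data): returns 1, 'b' (continue) or 0."""
    if not 0.0 <= alpha <= beta <= 1.0:
        raise ValueError("need 0 <= alpha <= beta <= 1")
    if pi < alpha:
        return 1   # belief in H = 0 is low enough to declare 1
    if pi > beta:
        return 0   # belief in H = 0 is high enough to declare 0
    return "b"     # otherwise take another measurement
```

The key structural property is that the set of beliefs at which the observer continues is a single interval; the counterexample in this section concerns rules whose continuation set is not of this form.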

Consider the following instance of Problem P. We have
equal prior on H, that is P(H = 0) = P(H = 1) = 1/2 and
a time horizon of T = 3. Assume k = 1 and 1 < K < 2.
The observation space of observer 1 is $\mathcal{Y}^1 = \{0, 1\}$ and
the observations at time t obey the following conditional
probabilities:

Observation           0              1
$P(\cdot|H = 0)$      $q_t$          $(1 - q_t)$
$P(\cdot|H = 1)$      $(1 - q_t)$    $q_t$

where $q_1 = q_2 = 1/2$ and $q_3 = 1$. Thus, the first
two observations of observer 1 reveal no information about
H while the third observation reveals H noiselessly. The
observation space of observer 2 is $\mathcal{Y}^2 = \{0, 1, 2\}$ and
the observations at time t obey the following conditional
probabilities:

Observation           0        1              2
$P(\cdot|H = 0)$      $r_t$    $(1 - r_t)$    0
$P(\cdot|H = 1)$      0        $(1 - r_t)$    $r_t$

where $r_2 = r_3 = 0$ and $0 < r_1 < 1$. Thus, the second and
third observations of observer 2 reveal no information about
H. Note that under this statistical model of observations,
there exists a choice of policies such that the system makes
a perfect final decision on the hypothesis and incurs only
operational costs (if observer 2 stops at t = 1 and observer 1
waits till time t = 3, then it can make a perfect decision on
H and the system incurs an operational cost of 2k = 2).
We assume that the cost of a mistake in the final decision
(a final decision different from H) is sufficiently high so that any choice of
policies that makes a mistake in the final decision with
non-zero probability will have a performance worse than 2. Thus,
any choice of policies that makes a mistake in the final
decision with non-zero probability cannot be optimal.

$$\begin{aligned} V_t(\pi, 0) = \min\{ & E[1_{\{\tau^1 = t\}} J(0, H) + 1_{\{\tau^1 > t\}}(k(\tau^1 - t) + J(U_{\tau^1}^1, H)) \mid \Pi_t^2 = \pi, 1_{\{\tau^1 < t\}} = 0, U_t^2 = 0], \\ & E[1_{\{\tau^1 = t\}} J(1, H) + 1_{\{\tau^1 > t\}}(k(\tau^1 - t) + J(U_{\tau^1}^1, H)) \mid \Pi_t^2 = \pi, 1_{\{\tau^1 < t\}} = 0, U_t^2 = 1], \\ & E[1_{\{\tau^1 = t\}}(k + V_{t+1}(\Pi_{t+1}^2, 1)) + 1_{\{\tau^1 > t\}}(K + V_{t+1}(\Pi_{t+1}^2, 0)) \mid \Pi_t^2 = \pi, 1_{\{\tau^1 < t\}} = 0, U_t^2 = b] \} \end{aligned} \tag{3}$$

Under the above instance of our problem, if observer 2 is
restricted to use a two-threshold rule at time t = 1, then the
lowest achievable value of the objective is given as:

$$\min[\{r_1 + (1 - r_1)(K + 1)\},\; \{2 - r_1/2\}] \tag{5}$$

where the first term corresponds to the case when $\gamma_1^2$ is given
as:

$$U_1^2 = \begin{cases} 1 & \text{if } \Pi_1^2 = 0 \\ b & \text{if } 0 < \Pi_1^2 < 1 \\ 0 & \text{if } \Pi_1^2 = 1 \end{cases} \tag{6}$$

and the second term corresponds to $\gamma_1^2$ being

$$U_1^2 = \begin{cases} 1 & \text{if } \Pi_1^2 < 1 \\ 0 & \text{if } \Pi_1^2 = 1 \end{cases} \tag{7}$$

Other choices of thresholds for observer 2 at time t = 1 do
not give a lower value than the expression in (5).
Consider now the following choice of $\gamma_1^{2,*}$:

$$U_1^2 = \begin{cases} 1 & \text{if } \Pi_1^2 = 0 \\ 0 & \text{if } 0 < \Pi_1^2 < 1 \\ b & \text{if } \Pi_1^2 = 1 \end{cases} \tag{8}$$

The lowest achievable expected cost under the above choice
of $\gamma_1^2$ is $J^* = 2(1 - r_1) + r_1(K + 1)/2$. It is easy to check
that for 1 < K < 2 and $r_1 < 2/3$,

$$J^* < \min[\{r_1 + (1 - r_1)(K + 1)\},\; \{2 - r_1/2\}]$$

Thus, $\gamma_1^{2,*}$ outperforms the two-threshold rules.
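The comparison between J* and the expression in (5) can be checked numerically. The Python sketch below evaluates both cost expressions from the text on a grid of parameter values with 1 < K < 2 and 0 < r_1 < 2/3. The function names and the grid resolution are illustrative choices, not part of the paper.

```python
def threshold_best(r1, K):
    """Expression (5): best cost when observer 2 uses a two-threshold rule at t = 1."""
    return min(r1 + (1 - r1) * (K + 1), 2 - r1 / 2)


def non_threshold_cost(r1, K):
    """J*: cost under the non-threshold rule (8)."""
    return 2 * (1 - r1) + r1 * (K + 1) / 2


def check_grid(n=50):
    """Return True if J* < expression (5) at every interior grid point."""
    for i in range(1, n):
        for j in range(1, n):
            K = 1 + i / n             # 1 < K < 2
            r1 = (2 / 3) * j / n      # 0 < r1 < 2/3
            if not non_threshold_cost(r1, K) < threshold_best(r1, K):
                return False
    return True


all_better = check_grid()
```

Algebraically, J* = 2 - r_1(3 - K)/2, so J* < 2 - r_1/2 is equivalent to K < 2, and J* < r_1 + (1 - r_1)(K + 1) is equivalent to r_1 < 2/3 when K > 1, which matches the conditions stated above.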

Discussion: In the above example, observer 1 can always
make the correct decision at time t = 3. However, this
incurs additional operational costs. A good policy should
try to enable the observers to make the correct decision
before time t = 3, whenever possible. If observer 2 gets
the observations 0 or 2 at t = 1, then it is certain about the
true hypothesis. Using the first threshold rule given in (6),
observer 2 is able to convey to observer 1 that it is certain
about the true hypothesis and what this hypothesis is, thus
preventing observer 1 from waiting till time t = 3 to make
the final decision. However, in the case when observer 2 gets
measurement 1, it sends a blank and postpones its decision
to stop. This incurs additional operating costs for observer 2
without providing any new information that may prevent
observer 1 from delaying its decision to time t = 3. By
making $r_1$ small, the contribution of this term in the overall
cost is increased. The second threshold rule (equation (7))
attempts to keep the operational costs of observer 2 small
but does not always send enough information to observer 1
to enable it to make a decision before t = 3 even when
observer 2 knows the true value of H. The non-threshold
rule (given in (8)), however, keeps the operational cost of
observer 2 small (when $r_1$ is small) while at the same time
ensuring that whenever observer 2 is certain about the true value
of H, the final decision is not postponed to time t = 3.

V. Parametric Characterization of Optimal 
Policies 

An important advantage of the threshold rules in the case
of the centralized or the decentralized Wald problem is that they
reduce the problem of finding the globally optimal policies
from a sequential functional optimization problem to a
sequential parametric optimization problem. Even though we
have established that a classical threshold rule does not hold 
for our problem, it is still possible to get a finite parametric 
characterization of optimal policies. Such a parametric char- 
acterization provides significant computational advantage in 
finding optimal policies, for example by reducing the search 
space for an optimal policy. 

In Theorem 1 we have established that for an arbitrarily
fixed choice of observer 1's policy, the optimal policy for
observer 2 can be determined by a dynamic program using
the value functions $V_t(\pi, a)$, t = T, ..., 2, 1. We have the
following lemma.

Lemma 2: With a fixed (but arbitrary) choice of $\Gamma^1$, the
value function at T can be expressed as:

$$V_T(\pi, a) := \min\{l^0(\pi), l^1(\pi)\} \tag{9}$$

where $l^0$ and $l^1$ are affine functions of $\pi$. Also, the value
functions at time t can be expressed as:

$$V_t(\pi, 1) := \min\{l^0(\pi), l^1(\pi), G_t(\pi)\} \tag{10}$$

where $G_t$ is a concave function of $\pi$, and

$$V_t(\pi, 0) := \min\{L_t^0(\pi), L_t^1(\pi), H_t(\pi)\} \tag{11}$$

where $L_t^0$ and $L_t^1$ are affine functions of $\pi$ and $H_t$ is a
concave function of $\pi$ (the actual form of these functions
depends on the choice of $\Gamma^1$).

Proof: See Appendix III. ■



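$$\begin{aligned} V_t(\pi, 0) = \min\{ & E[k(\tau^2 - t) + J(U_{\tau^2}^2, H) \mid \Pi_t^1 = \pi, 1_{\{\tau^2 < t\}} = 0, U_t^1 = 0], \\ & E[k(\tau^2 - t) + J(U_{\tau^2}^2, H) \mid \Pi_t^1 = \pi, 1_{\{\tau^2 < t\}} = 0, U_t^1 = 1], \\ & E[1_{\{\tau^2 = t\}}(k + V_{t+1}(\Pi_{t+1}^1, 1)) + 1_{\{\tau^2 > t\}}(K + V_{t+1}(\Pi_{t+1}^1, 0)) \mid \Pi_t^1 = \pi, 1_{\{\tau^2 < t\}} = 0, U_t^1 = b] \} \end{aligned} \tag{4}$$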



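Theorem 2: For any fixed policy $\Gamma^1$ of observer 1, an
optimal policy for observer 2 can be characterized as follows:

$$U_T^2 = \begin{cases} 1 & \text{if } \Pi_T^2 \le \alpha_T \\ 0 & \text{if } \Pi_T^2 > \alpha_T \end{cases}$$

where $0 \le \alpha_T \le 1$. For t = 1, 2, ..., T - 1, if $1_{\{\tau^1 < t\}} = 1$,

$$U_t^2 = \begin{cases} 1 & \text{if } \Pi_t^2 < \alpha_t(1) \\ b & \text{if } \alpha_t(1) \le \Pi_t^2 \le \beta_t(1) \\ 0 & \text{if } \Pi_t^2 > \beta_t(1) \end{cases}$$

where $0 \le \alpha_t(1) \le \beta_t(1) \le 1$, and if $1_{\{\tau^1 < t\}} = 0$,

$$U_t^2 = \begin{cases} b & \text{if } \Pi_t^2 < \alpha_t(0) \\ 1 & \text{if } \alpha_t(0) \le \Pi_t^2 < \beta_t(0) \\ b & \text{if } \beta_t(0) \le \Pi_t^2 < \delta_t(0) \\ 0 & \text{if } \delta_t(0) \le \Pi_t^2 < \theta_t(0) \\ b & \text{if } \Pi_t^2 \ge \theta_t(0) \end{cases}$$

where $0 \le \alpha_t(0) \le \beta_t(0) \le \delta_t(0) \le \theta_t(0) \le 1$.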

Proof: From Lemma 2, we know that the value functions
can be written as the minimum of affine and concave functions.
Since taking the minimum of two straight lines and a concave
function can partition the interval [0, 1] into at most five
intervals, this gives a four-threshold characterization of the
optimal policy where the thresholds signify the boundaries of
these intervals. At time T or when $1_{\{\tau^1 < t\}} = 1$, observer 2's
decision of 0 or 1 is the final decision on H. In these
cases, if observer 2 is certain about H (that is, its belief is
0 or 1), then it should clearly choose the correct value of H.
This fact reduces the number of thresholds for time T and
when $1_{\{\tau^1 < t\}} = 1$. ■
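The partition argument in the proof can be illustrated numerically. The Python sketch below takes two affine functions and one concave function on [0, 1], finds which of them attains the pointwise minimum on a fine grid, and reports the boundaries between the resulting intervals. All three example functions are hypothetical and were chosen only so that five intervals appear; they are not derived from any instance of the problem.

```python
import numpy as np


def decision_regions(line0, line1, concave, n_grid=10001):
    """Locate the intervals of [0, 1] on which each function is the minimum.

    Returns the approximate boundaries and the label of each interval:
    0 -> line0 (stop with 0), 1 -> line1 (stop with 1), 2 -> concave (continue).
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    vals = np.vstack([line0(grid), line1(grid), concave(grid)])
    choice = np.argmin(vals, axis=0)
    change = np.flatnonzero(np.diff(choice))   # indices where the minimizer switches
    boundaries = grid[change + 1]
    labels = choice[np.r_[0, change + 1]]
    return boundaries, labels


# Hypothetical example functions (assumptions made for illustration only).
boundaries, labels = decision_regions(
    line0=lambda p: 1.512 - 1.44 * p,
    line1=lambda p: 0.072 + 1.44 * p,
    concave=lambda p: 2.4 * p * (1.0 - p),
)
```

In this example the minimizer switches four times, near 0.1, 0.3, 0.7 and 0.9, giving the pattern continue, 1, continue, 0, continue. A concave function lies above a given line on a single interval, which is why no more than five intervals can arise.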

Discussion: It is instructive to compare our problem with
the decentralized Wald problem studied in [6]. Both problems
involve two observers that make repeated observations of H
and decide when to stop. Unlike the problem in this paper,
in the decentralized Wald problem the sensors do not have
access to each other's decisions, that is, an observer's policy
is restricted to be of the form:

$$U_t^i = \gamma_t^i(Y_{1:t}^i, U_{1:t-1}^i)$$

The optimality of classical two-threshold rules for the
decentralized Wald problem was established in [6]. In this paper,
we allowed each observer to observe the other's decisions and
showed that the two-threshold rules are no longer optimal.

Both the decentralized Wald problem and the problem
formulated in this paper are team problems. That is, they involve
more than one decision maker with a common objective. 
However, in the decentralized Wald problem, a decision- 
maker's decisions do not influence the information available 
to other decision-makers. This is the essential criterion for 
static team problems [9]. The problem formulated in this 
paper is a dynamic team problem since a decision-maker's 
past decisions are a part of the information available to 
other decision-makers. It is the dynamic aspect of this team 
problem that allows for signaling between decision-makers 
and changes the nature of optimal policies. 



VI. Conclusions 

We considered a sequential decentralized
detection problem with signaling. Two observers make separate
costly measurements of a binary hypothesis and decide when
to stop. The observers can observe each other's decisions
(whether the other observer has stopped or not and whether
the final decision was 0 or 1). The final decision of the
observer that stops in the end is taken as the final decision 
made by the group. Thus, observers can convey information 
to each other through their decisions. We identified informa- 
tion states for the two observers and showed that classical 
two threshold rules no longer guarantee optimality. However, 
a finite parametric characterization of optimal policies is still 
possible. 

Appendix I 
Proof of Lemma 1 

Proof: Part (i) follows from the definition of $\tau^1$.
In Part (ii), if $\tau^1 < t$, then by definition, we have

$$\Pi_t^2 := P(H = 0 \mid Y_{1:t}^2, U_{1:t-1}^1, U_{1:t-1}^2 = b_{1:t-1}) = P(H = 0 \mid Y_{1:t}^2, U_{1:\tau^1}^1) \tag{12}$$

where we removed redundant terms from the conditioning
(terms which are constants or functions of other terms).
Similarly,

$$\Pi_{t+1}^2 = P(H = 0 \mid Y_{1:t+1}^2, U_{1:\tau^1}^1),$$

which, on using Bayes' rule gives,

$$\Pi_{t+1}^2 = \frac{P(Y_{t+1}^2 \mid H = 0)\,\Pi_t^2}{P(Y_{t+1}^2 \mid H = 0)\,\Pi_t^2 + P(Y_{t+1}^2 \mid H = 1)(1 - \Pi_t^2)} =: f_{t+1}(\Pi_t^2, Y_{t+1}^2) \tag{13}$$

If $\tau^1 \ge t$, then $U_{1:t-1}^1 = b_{1:t-1}$ (that is, all decisions of
observer 1 are blanks till time t - 1) and

$$\Pi_t^2 := P(H = 0 \mid Y_{1:t}^2, U_{1:t-1}^1 = b_{1:t-1}, U_{1:t-1}^2 = b_{1:t-1})$$

Also,

$$\begin{aligned} \Pi_{t+1}^2 &:= P(H = 0 \mid Y_{1:t+1}^2, U_{1:t-1}^1 = b_{1:t-1}, U_t^1, U_{1:t}^2 = b_{1:t}) \\ &= \frac{P(Y_{t+1}^2, U_t^1, H = 0 \mid Y_{1:t}^2, U_{1:t-1}^1 = b_{1:t-1}, U_{1:t}^2 = b_{1:t})}{\sum_{h \in \{0,1\}} P(Y_{t+1}^2, U_t^1, H = h \mid Y_{1:t}^2, U_{1:t-1}^1 = b_{1:t-1}, U_{1:t}^2 = b_{1:t})} \end{aligned} \tag{14}$$

The numerator in (14) can be written as:

$$P(Y_{t+1}^2 \mid H = 0) \cdot P(U_t^1 \mid H = 0, Y_{1:t}^2, U_{1:t-1}^1 = b_{1:t-1}, U_{1:t}^2 = b_{1:t}) \cdot \Pi_t^2 \tag{15}$$

We now focus on the second term in (15).
Claim: Consider a realization $u_t^1, y_{1:t}^2$. Then,

$$P(u_t^1 \mid H = 0, y_{1:t}^2, U_{1:t-1}^1 = b_{1:t-1}, U_{1:t}^2 = b_{1:t}) = P(u_t^1 \mid H = 0, U_{1:t-1}^1 = b_{1:t-1}, U_{1:t}^2 = b_{1:t}) \tag{16}$$

Moreover, under the given choice of $\Gamma^1$, the probability on
the right hand side of (16) is a function only of $u_t^1$.



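Proof of claim: Using Bayes' rule,

$$P(u_t^1 \mid H = 0, y_{1:t}^2, U_{1:t-1}^1 = b_{1:t-1}, U_{1:t}^2 = b_{1:t}) = \frac{P(u_t^1, H = 0, y_{1:t}^2, U_{1:t-1}^1 = b_{1:t-1}, U_{1:t}^2 = b_{1:t})}{\sum_{u'} P(U_t^1 = u', H = 0, y_{1:t}^2, U_{1:t-1}^1 = b_{1:t-1}, U_{1:t}^2 = b_{1:t})} \tag{17}$$

Consider the joint probability in the numerator in (17):

$$P(u_t^1, H = 0, y_{1:t}^2, b_{1:t-1}^1, b_{1:t}^2)$$

where we use $b_{1:t-1}^1, b_{1:t-1}^2$ as shorthand notations for
$U_{1:t-1}^1 = b_{1:t-1}$ and $U_{1:t-1}^2 = b_{1:t-1}$ respectively. This
probability can be further written as:

$$\begin{aligned} &= \sum_{y_{1:t}^1} P(u_t^1, H = 0, y_{1:t}^2, b_{1:t-1}^1, b_{1:t}^2, y_{1:t}^1) \\ &= \sum_{y_{1:t}^1} \Big[ P(u_t^1 \mid y_{1:t}^1, b_{1:t-1}^1, b_{1:t-1}^2) \cdot P(U_t^2 = b \mid y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2) \cdot P(y_t^1 \mid H = 0) P(y_t^2 \mid H = 0) \\ &\qquad \cdot \prod_{k=1}^{t-1} \big\{ P(U_k^1 = b \mid y_{1:k}^1, b_{1:k-1}^1, b_{1:k-1}^2) \cdot P(U_k^2 = b \mid y_{1:k}^2, b_{1:k-1}^1, b_{1:k-1}^2) \cdot P(y_k^1 \mid H = 0) P(y_k^2 \mid H = 0) \big\} \Big] \cdot p_0 \end{aligned} \tag{18}$$

Rearranging the summation in (18), we get

$$\begin{aligned} & P(U_t^2 = b \mid y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2)\, P(y_t^2 \mid H = 0) \cdot p_0 \cdot \prod_{k=1}^{t-1} \big\{ P(U_k^2 = b \mid y_{1:k}^2, b_{1:k-1}^1, b_{1:k-1}^2)\, P(y_k^2 \mid H = 0) \big\} \\ & \times \sum_{y_{1:t}^1} \Big[ P(u_t^1 \mid y_{1:t}^1, b_{1:t-1}^1, b_{1:t-1}^2)\, P(y_t^1 \mid H = 0) \prod_{k=1}^{t-1} \big\{ P(U_k^1 = b \mid y_{1:k}^1, b_{1:k-1}^1, b_{1:k-1}^2)\, P(y_k^1 \mid H = 0) \big\} \Big] \end{aligned} \tag{19}$$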



Expressions similar to (19) hold for each term in the
denominator of (17) and the terms outside the summation over $y_{1:t}^1$
cancel out in the numerator and the denominator. We note
that the summation over $y_{1:t}^1$ in (19) does not depend on $y_{1:t}^2$.
Hence, the conditional probability in the left hand side of
(17) does not depend on $y_{1:t}^2$. This establishes equation (16).
We also note that under the fixed policy $\Gamma^1$ of observer 1, the
summation over $y_{1:t}^1$ in (19) is a function only of $u_t^1$. Thus,
the probability on the right hand side of (16) is a function
only of $u_t^1$. This concludes the proof of the claim.

Using the result of the claim in (15) and using similar
arguments for the denominator in (14), we get

$$\Pi_{t+1}^2 = \frac{P(Y_{t+1}^2 \mid H = 0)\, P(U_t^1 \mid H = 0, b_{1:t-1}^1, b_{1:t}^2)\, \Pi_t^2}{P(Y_{t+1}^2 \mid H = 0)\, P(U_t^1 \mid H = 0, b_{1:t-1}^1, b_{1:t}^2)\, \Pi_t^2 + P(Y_{t+1}^2 \mid H = 1)\, P(U_t^1 \mid H = 1, b_{1:t-1}^1, b_{1:t}^2)(1 - \Pi_t^2)} =: g_{t+1}(\Pi_t^2, Y_{t+1}^2, U_t^1) \tag{20}$$
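The update in (20) differs from the plain Bayes step only in that observer 1's message contributes its own likelihood term. A minimal Python sketch of this step is given below. The function name and arguments are hypothetical; in particular, the message likelihoods `msg0` and `msg1` stand for P(U_t^1 = u | H = h, blanks so far), which depend on observer 1's fixed policy and are assumed here to be supplied by the caller.

```python
def update_belief_with_message(pi, lik0, lik1, msg0, msg1):
    """Bayes step for pi = P(H = 0) using a new observation and a message.

    lik0, lik1: P(y | H = 0), P(y | H = 1) for observer 2's new observation y.
    msg0, msg1: probability of observer 1's message u under H = 0 and H = 1,
                given that all earlier messages were blanks (assumed inputs).
    """
    num = lik0 * msg0 * pi
    denom = num + lik1 * msg1 * (1.0 - pi)
    if denom == 0.0:
        raise ValueError("observation and message have zero joint probability")
    return num / denom


# An uninformative message (equal likelihoods) leaves the plain Bayes update.
posterior = update_belief_with_message(0.5, lik0=0.6, lik1=0.3, msg0=0.4, msg1=0.4)
```

When the message is equally likely under both hypotheses it cancels from the ratio, and the step reduces to the update in (13); a message that is impossible under one hypothesis drives the belief to 0 or 1, which is the signaling effect exploited in the counterexample.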



Appendix II 
Proof Outline of Theorem 1 

We provide an outline of the proof of Theorem 1. The
general idea is to show that at each time t, the value functions
of Definition 2 represent the optimal future costs. Therefore,
a policy that for each realization of $\Pi_t^2, 1_{\{\tau^1 < t\}}$ selects
the minimizing term in the corresponding value function
achieves the optimal cost. Thus, an optimal policy can be
found that depends only on $\Pi_t^2, 1_{\{\tau^1 < t\}}$. We start from time
T.

If observer 2 is active at the terminal time T, it can
only make one of two decisions: 0 or 1. The expected future
cost of choosing $u \in \{0, 1\}$ for observer 2 is

$$\begin{aligned} E[J(u, H) \mid Y_{1:T}^2, U_{1:T-1}^1, U_{1:T-1}^2 = b_{1:T-1}] &= J(u, 0)\,\Pi_T^2 + J(u, 1)(1 - \Pi_T^2) \\ &= E[J(u, H) \mid \Pi_T^2] \end{aligned} \tag{21}$$

Thus, the value function at time T is the minimum of the
expected future costs incurred by choosing 0 or 1. Hence,
it represents the optimal expected future cost for observer 2
at time T. Proceeding backwards, we assume that the value
functions at time t + 1, t + 2, ..., T represent optimal future
costs at the respective times and consider two cases at each
time t < T.

Case A: $\tau^1 < t$. If observer 1 has already stopped before
t, then observer 2's stopping problem is the same as the
centralized Wald problem and the value function $V_t(\pi, 1)$ is the
same as the value function in the dynamic program for the
Wald problem.

Case B: $\tau^1 \ge t$. We now consider the case when observer 1
has not stopped before time t. If observer 2 decides to stop
and chooses $u \in \{0, 1\}$ at time t, then the expected future
cost will be

$$E\big[1_{\{\tau^1 = t\}} J(u, H) + 1_{\{\tau^1 > t\}}(k(\tau^1 - t) + J(U_{\tau^1}^1, H)) \,\big|\, Y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u\big] \tag{22}$$



Claim: The expectation in (22) is the same as:

$$E\big[1_{\{\tau^1 = t\}} J(u, H) + 1_{\{\tau^1 > t\}}(k(\tau^1 - t) + J(U_{\tau^1}^1, H)) \,\big|\, \pi_t^2, 1_{\{\tau^1 < t\}} = 0, U_t^2 = u\big] \tag{23}$$

Proof of claim: For each realization $y_{1:t}^2$ of observer 2's
observations, the expectation in (22) depends on the
conditional distribution of the following random variables:
$H, \tau^1, U_{\tau^1}^1$ given the realization of the random variables
$y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u$. Note that under the fixed policy
$\Gamma^1$ of observer 1, $\tau^1, U_{\tau^1}^1$ are functions of observer 1's
observation sequence $Y_{1:T}^1$ and the terms $b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u$
fixed in the conditioning. Hence the conditional belief

$$P(H, \tau^1, U_{\tau^1}^1 \mid y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u)$$

is a deterministic transformation of the belief

$$P(H, Y_{1:T}^1 \mid y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u).$$

We will show that the above probability is the same as

$$P(H, Y_{1:T}^1 \mid \pi_t^2, 1_{\{\tau^1 < t\}} = 0, U_t^2 = u)$$

and hence the conditional expectation in (22) is the same as

$$E\big[1_{\{\tau^1 = t\}} J(u, H) + 1_{\{\tau^1 > t\}}(k(\tau^1 - t) + J(U_{\tau^1}^1, H)) \,\big|\, \pi_t^2, 1_{\{\tau^1 < t\}} = 0, U_t^2 = u\big]$$

which corresponds to the first two terms in the minimization
in $V_t(\pi_t^2, 0)$ in equation (3).

Consider

$$P(H = 0, y_{1:T}^1 \mid y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u) = P(y_{1:T}^1 \mid H = 0, y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u)\, \pi_t^2 \tag{24}$$

Similarly,

$$P(H = 0, y_{1:T}^1 \mid \pi_t^2, 1_{\{\tau^1 < t\}} = 0, U_t^2 = u) = P(y_{1:T}^1 \mid H = 0, \pi_t^2, 1_{\{\tau^1 < t\}} = 0, U_t^2 = u)\, \pi_t^2 \tag{25}$$

We now compare the first terms in (24) and (25). Consider
the first term in (24), which can be written as

$$\frac{P(H = 0, y_{1:T}^1, y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u)}{\sum_{y_{1:T}^1} P(H = 0, y_{1:T}^1, y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u)} \tag{26}$$

The numerator can be written as:

$$\begin{aligned} & P(y_{t+1:T}^1 \mid H = 0)\, P(U_t^2 = u \mid y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2) \cdot P(y_t^1 \mid H = 0)\, P(y_t^2 \mid H = 0) \\ & \cdot \prod_{k=1}^{t-1} \big\{ P(U_k^1 = b \mid y_{1:k}^1, b_{1:k-1}^1, b_{1:k-1}^2) \cdot P(U_k^2 = b \mid y_{1:k}^2, b_{1:k-1}^1, b_{1:k-1}^2) \cdot P(y_k^1 \mid H = 0)\, P(y_k^2 \mid H = 0) \big\} \cdot p_0 \end{aligned} \tag{27}$$

Similar expressions hold for the denominator in (26) and the
terms that depend on $y_{1:t}^2$ will cancel in the numerator and
the denominator. Therefore,

$$P(y_{1:T}^1 \mid H = 0, y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u) = P(y_{1:T}^1 \mid H = 0, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u) \tag{28}$$

Now consider the first term in (25), which can be written as

$$\begin{aligned} P(y_{1:T}^1 \mid H = 0, \pi_t^2, 1_{\{\tau^1 < t\}} = 0, U_t^2 = u) &= \sum_{y_{1:t}^2} \big[ P(y_{1:T}^1 \mid y_{1:t}^2, H = 0, \pi_t^2, 1_{\{\tau^1 < t\}} = 0, U_t^2 = u) \cdot P(y_{1:t}^2 \mid H = 0, \pi_t^2, 1_{\{\tau^1 < t\}} = 0, U_t^2 = u) \big] \\ &= \sum_{y_{1:t}^2} \big[ P(y_{1:T}^1 \mid y_{1:t}^2, H = 0, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u) \cdot P(y_{1:t}^2 \mid H = 0, \pi_t^2, 1_{\{\tau^1 < t\}} = 0, U_t^2 = u) \big] \end{aligned} \tag{29}$$

The first term inside the summation in (29) is the same as the LHS
of (28). Using (28) in (29) gives

$$\sum_{y_{1:t}^2} \big[ P(y_{1:T}^1 \mid H = 0, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u) \cdot P(y_{1:t}^2 \mid H = 0, \pi_t^2, 1_{\{\tau^1 < t\}} = 0, U_t^2 = u) \big] = P(y_{1:T}^1 \mid H = 0, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = u) \tag{30}$$

which is the same as the RHS of (28). Thus the probabilities in the
RHS of (24) and (25) are equal. Similar conclusions hold
for H = 1 in (24) and (25). This implies the equality of
expectations and completes the proof of the claim.

As a consequence of the claim, the first two terms in
the minimization in the definition of $V_t(\pi, 0)$ (equation (3))
correspond to the expected future cost of choosing 0 or 1 at
time t. On the other hand, if observer 2 decides to continue at
time t, then by the fact that value functions at t + 1 represent
the expected future costs at t + 1, we can write the expected
future cost as:

$$E\big[1_{\{\tau^1 = t\}}(k + V_{t+1}(\Pi_{t+1}^2, 1)) + 1_{\{\tau^1 > t\}}(K + V_{t+1}(\Pi_{t+1}^2, 0)) \,\big|\, Y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = b\big] \tag{31}$$

Using Lemma 1 and the fact that under fixed policy $\Gamma^1$
of observer 1, $U_t^1$ is a function of observer 1's observation
sequence $Y_{1:t}^1$ and the terms $b_{1:t-1}^1, b_{1:t-1}^2$ fixed in the
conditioning, one can conclude that for each realization of $y_{1:t}^2$ this
expectation in (31) is a function of the following conditional
probability:

$$P(Y_{1:t}^1, Y_{t+1}^2 \mid y_{1:t}^2, b_{1:t-1}^1, b_{1:t-1}^2, U_t^2 = b)$$

Using arguments similar to those in the claim above, it can
be shown that the above conditional probability is the same as:

$$P(Y_{1:t}^1, Y_{t+1}^2 \mid \pi_t^2, 1_{\{\tau^1 < t\}} = 0, U_t^2 = b)$$

This shows that the third term in the minimization in the
definition of $V_t(\pi, 0)$ (equation (3)) is the expected future cost
of making a decision to continue at time t. Thus, $V_t(\Pi_t^2, 0)$
is the minimum of the future costs incurred by choosing 0, 1
or b. Hence, it represents the optimal future cost at time t,
if observer 1 has not already stopped before time t.

Appendix III 
Proof Outline of Lemma 2 

We define the following functions 

\[
\begin{aligned}
l^0(\pi) &:= J(0,0)\pi + J(0,1)(1 - \pi) = E[J(0,H) \mid \Pi^2_t = \pi] \\
l^1(\pi) &:= J(1,0)\pi + J(1,1)(1 - \pi) = E[J(1,H) \mid \Pi^2_t = \pi]
\end{aligned}
\]



For the value function at time $T$, the result of the lemma
follows from the definitions of $l^0(\pi)$, $l^1(\pi)$ and $V_T(\pi, a)$.
Since, for each $a \in \{0, 1\}$, $V_T(\pi, a)$ is the minimum of two
affine functions of $\pi$, it follows that, for each $a \in \{0, 1\}$,
$V_T(\pi, a)$ is a concave function of $\pi$. Now, assume that
$V_{t+1}(\pi, a)$ is concave in $\pi$ for each $a \in \{0, 1\}$. The concavity
of the value functions at time $t + 1$ implies that they can be
written as infima of affine functions of $\pi$. In particular, we
have

\[
V_{t+1}(\pi, 1) = \inf_i \{a_i \pi + b_i\} \tag{32}
\]

and

\[
V_{t+1}(\pi, 0) = \inf_i \{c_i \pi + d_i\} \tag{33}
\]
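The base case of the induction can be checked numerically. The following is a minimal sketch, not part of the original proof: it assumes that the terminal value is the pointwise minimum of the two affine functions defined at the start of this appendix, and it uses a hypothetical cost matrix `J` chosen only for illustration. It tests midpoint concavity on a grid of beliefs.

```python
# Hypothetical terminal costs: J[u][h] is the cost of declaring u when H = h.
# These numbers are illustrative assumptions, not values from the paper.
J = [[0.0, 10.0],
     [5.0, 0.0]]


def l(u, pi):
    """Expected terminal cost of declaring u when P(H = 0) = pi (affine in pi)."""
    return J[u][0] * pi + J[u][1] * (1.0 - pi)


def terminal_value(pi):
    """Assumed terminal value: the minimum of the two affine functions."""
    return min(l(0, pi), l(1, pi))


def is_midpoint_concave(f, grid, tol=1e-9):
    """Check f((x + y) / 2) >= (f(x) + f(y)) / 2 for every pair of grid points."""
    return all(f((x + y) / 2.0) >= (f(x) + f(y)) / 2.0 - tol
               for x in grid for y in grid)


grid = [i / 100.0 for i in range(101)]
concave = is_midpoint_concave(terminal_value, grid)
```

A grid check of this kind cannot prove concavity; it only fails to find a counterexample, which is consistent with the argument that a minimum of affine functions is concave.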



Now consider $V_t(\pi, 1)$. The first two terms in the definition
of $V_t(\pi, 1)$ are affine in $\pi$ (see Definition 2). We need to
show that the third term,

\[
k + E\big[ V_{t+1}(\Pi^2_{t+1}, 1) \,\big|\, \Pi^2_t = \pi, 1_{\{\tau^1 < t\}} = 1 \big], \tag{34}
\]



is a concave function of $\pi$. From equation (13) in the proof
of Lemma 1, we know that $\Pi^2_{t+1}$ can be written as:

\[
\Pi^2_{t+1}
= \frac{P(Y^2_{t+1} \mid H = 0)\,\Pi^2_t}{P(Y^2_{t+1} \mid H = 0)\,\Pi^2_t + P(Y^2_{t+1} \mid H = 1)(1 - \Pi^2_t)}
= \frac{P(Y^2_{t+1} \mid H = 0)\,\Pi^2_t}{P(Y^2_{t+1} \mid \Pi^2_t)} \tag{35}
\]



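Substituting (35) in (34) and evaluating the expectation gives:

\[
k + \sum_{y \in \mathcal{Y}} P(Y^2_{t+1} = y \mid \Pi^2_t = \pi, 1_{\{\tau^1 < t\}} = 1)\,
V_{t+1}\!\left( \frac{P(Y^2_{t+1} = y \mid H = 0)\,\pi}{P(Y^2_{t+1} = y \mid \pi)},\, 1 \right) \tag{36}
\]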



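Now using the characterization of $V_{t+1}(\pi, 1)$ from (32), we
get

\[
\begin{aligned}
&k + \sum_{y \in \mathcal{Y}} P(Y^2_{t+1} = y \mid \pi)\,
\inf_i \left\{ a_i \frac{P(Y^2_{t+1} = y \mid H = 0)\,\pi}{P(Y^2_{t+1} = y \mid \pi)} + b_i \right\} \\
&\quad = k + \sum_{y \in \mathcal{Y}} \inf_i \big\{ a_i P(Y^2_{t+1} = y \mid H = 0)\,\pi
+ b_i P(Y^2_{t+1} = y \mid H = 0)\,\pi \\
&\qquad\qquad\qquad\quad + b_i P(Y^2_{t+1} = y \mid H = 1)(1 - \pi) \big\}
\end{aligned} \tag{37}
\]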



Each term in the summation over $y \in \mathcal{Y}$ is an infimum of
affine functions of $\pi$, hence each term in the summation is
a concave function of $\pi$. Thus, the third term of $V_t(\pi, 1)$ is
a concave function of $\pi$.
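The step from (36) to (37) can be sanity-checked numerically. The sketch below is illustrative only: the binary observation alphabet, the likelihood table `LIK`, the operating cost `K_SMALL` and the affine pieces `PIECES` are all hypothetical choices, not quantities from the paper. It evaluates the continuation term both directly through the belief update (35) and through the sum of infima of affine functions, then checks that the two agree and that the result is midpoint concave on a grid.

```python
# Hypothetical binary observation model: LIK[h][y] = P(Y = y | H = h).
LIK = [[0.7, 0.3],
       [0.2, 0.8]]
K_SMALL = 1.0  # hypothetical operating cost k when one observer is active

# Hypothetical affine pieces (a_i, b_i): V_{t+1}(pi, 1) = min_i (a_i * pi + b_i).
PIECES = [(10.0, 0.0), (-5.0, 5.0), (0.0, 3.0)]


def v_next(pi):
    """Assumed concave value function at t + 1, as a minimum of affine pieces."""
    return min(a * pi + b for a, b in PIECES)


def prob_y(y, pi):
    """P(Y = y | pi), where pi = P(H = 0)."""
    return LIK[0][y] * pi + LIK[1][y] * (1.0 - pi)


def update(pi, y):
    """Bayes update of the belief on H = 0 after observing y, as in (35)."""
    return LIK[0][y] * pi / prob_y(y, pi)


def continuation_direct(pi):
    """k + sum_y P(y | pi) * V_{t+1}(update(pi, y), 1), the form in (36)."""
    return K_SMALL + sum(prob_y(y, pi) * v_next(update(pi, y)) for y in (0, 1))


def continuation_affine(pi):
    """k + sum_y inf_i {affine function of pi}, the form in (37)."""
    total = K_SMALL
    for y in (0, 1):
        total += min(a * LIK[0][y] * pi
                     + b * LIK[0][y] * pi
                     + b * LIK[1][y] * (1.0 - pi)
                     for a, b in PIECES)
    return total


grid = [i / 50.0 for i in range(51)]
forms_agree = all(abs(continuation_direct(p) - continuation_affine(p)) < 1e-9
                  for p in grid)
midpoint_concave = all(
    continuation_direct((x + y) / 2.0)
    >= (continuation_direct(x) + continuation_direct(y)) / 2.0 - 1e-9
    for x in grid for y in grid)
```

Multiplying by $P(Y^2_{t+1} = y \mid \pi)$ cancels the denominator of the updated belief, which is why each summand becomes an infimum of affine functions of $\pi$ even though the belief update itself is nonlinear.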

Next consider $V_t(\pi, 0)$ defined in (5). The conditional
expectation for the first (or second) term in the minimization
on the RHS of (5) is an affine function of the conditional
probability of the random variables $H, Y^1_{1:T}$. Using arguments from
Appendix II, this conditional probability can be written as:

\[
P(y^1_{1:T}, H = 0 \mid \pi^2_t, b^1_{1:t-1}, b^2_{1:t-1}, U^2_t = 0)
= P(y^1_{1:T} \mid H = 0, b^1_{1:t-1}, b^2_{1:t-1}, U^2_t = 0)\,\pi^2_t \tag{38}
\]

and

\[
P(y^1_{1:T}, H = 1 \mid \pi^2_t, b^1_{1:t-1}, b^2_{1:t-1}, U^2_t = 0)
= P(y^1_{1:T} \mid H = 1, b^1_{1:t-1}, b^2_{1:t-1}, U^2_t = 0)(1 - \pi^2_t) \tag{39}
\]

Thus, the conditional probability
$P(Y^1_{1:T}, H \mid \pi^2_t, b^1_{1:t-1}, b^2_{1:t-1}, U^2_t = 0)$ is an affine function
of $\pi^2_t$. This establishes the affine nature of the first two
terms on the RHS of (5). The third term in $V_t(\pi, 0)$ can be
written as:

\[
E\Big[ 1_{\{U^1_t \neq b\}} \big(k + V_{t+1}(g_{t+1}(\pi, Y^2_{t+1}, U^1_t), 1)\big)
+ 1_{\{U^1_t = b\}} \big(K + V_{t+1}(g_{t+1}(\pi, Y^2_{t+1}, b), 0)\big)
\,\Big|\, \Pi^2_t = \pi, 1_{\{\tau^1 < t\}} = 0, U^2_t = b \Big] \tag{40}
\]

Consider the first term in the summation in (40). Evaluating
the expectation, we get

\[
\sum_{u^1_t \neq b} \sum_{y^2_{t+1}} \Big[ \big(k + V_{t+1}(g_{t+1}(\pi, y^2_{t+1}, u^1_t), 1)\big)
\cdot P(u^1_t, y^2_{t+1} \mid \Pi^2_t = \pi, 1_{\{\tau^1 < t\}} = 0, U^2_t = b) \Big] \tag{41}
\]

Using the characterization of $g_{t+1}$ from (14) and (20), the
characterization of $V_{t+1}(\pi, 1)$ from (32) and arguments from
Appendix II, (41) can be shown to be equal to

\[
\begin{aligned}
\sum_{u^1_t \neq b} \sum_{y^2_{t+1}} \Big[\, & k\, P(u^1_t, y^2_{t+1} \mid \Pi^2_t = \pi, 1_{\{\tau^1 < t\}} = 0, U^2_t = b) \\
& + \inf_i \big\{ a_i P(y^2_{t+1} \mid H = 0)\, P(u^1_t \mid H = 0, b^1_{1:t-1}, b^2_{1:t})\,\pi \\
& \qquad\quad + b_i P(y^2_{t+1} \mid H = 0)\, P(u^1_t \mid H = 0, b^1_{1:t-1}, b^2_{1:t})\,\pi \\
& \qquad\quad + b_i P(y^2_{t+1} \mid H = 1)\, P(u^1_t \mid H = 1, b^1_{1:t-1}, b^2_{1:t})(1 - \pi) \big\} \Big]
\end{aligned} \tag{42}
\]

which is concave in $\pi$ (since it is a sum of terms, each of which
is affine or an infimum of affine functions of $\pi$). Similar arguments
can be made for the second term in (40) to conclude the concavity of
the third term in $V_t(\pi, 0)$.



